Abstract. Caching (also known as paging) is a classical problem concerning
page replacement policies in two-level memory systems. General
caching is the variant with pages of different sizes and fault costs. We
give the first NP-hardness result for general caching with small pages:
General caching is (strongly) NP-hard even when page sizes are limited
to {1, 2, 3}. It holds already in the fault model (each page has unit
fault cost) as well as in the bit model (each page has the same fault cost
as size). We also give a very short proof of the strong NP-hardness of
general caching with page sizes restricted to {1, 2, 3} and arbitrary costs.

1 Introduction

Caching (also known as uniform caching or paging) is a classical problem in
the area of online algorithms and has been extensively studied since the 1960s. It
models a two-level memory system: there is a fast memory of size C (the
cache) and a slow but large main memory where all data reside. The problem
instance comprises a sequence of requests, each demanding a page from the main 
memory. No cost is incurred if the requested page is present in the cache (a cache 
hit). If the requested page is not present in the cache (a cache fault), the page 
must be loaded at the fault cost of one; some page must be evicted to make 
space for the new one when there are already C pages in the cache. The natural 
objective is to evict pages in such a way that the total fault cost is minimized. 
For a reference on classical results, see Borodin and El-Yaniv [6]. 
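The uniform model above is easy to simulate. The following is a minimal sketch, assuming the furthest-in-future eviction rule (commonly known as Belady's rule) and a cache size of at least one; the function name `belady_faults` is hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def belady_faults(requests: Sequence[Hashable], cache_size: int) -> int:
    """Count faults in uniform caching when the evicted page is always
    the cached page whose next request lies furthest in the future."""
    cache: set = set()
    faults = 0
    for t, page in enumerate(requests):
        if page in cache:
            continue  # cache hit: no cost
        faults += 1  # cache fault: unit cost, the page is loaded
        if len(cache) >= cache_size:

            def next_use(q: Hashable) -> int:
                # Position of the next request to q, or "infinity" if none.
                for s in range(t + 1, len(requests)):
                    if requests[s] == q:
                        return s
                return len(requests)

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults
```

For instance, `belady_faults(list("abcab"), 2)` counts four faults: the three first requests and the final request to `b`, which was evicted when `c` arrived.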

In the 1990s, with the advent of the World Wide Web, a generalized variant called
file caching or simply general caching was studied [9,10]. In this setting, each
page p has its SIZE(p) and COST(p). It costs COST(p) to load this page into the
cache and the page occupies SIZE(p) units of memory there. Uniform caching
is the special case satisfying SIZE(p) = COST(p) = 1 for every page p. Other
important cases of this general model are

— the cost model (weighted caching): SIZE(p) = 1 for every page p;

— the bit model: COST(p) = SIZE(p) for every page p;

— the fault model: COST(p) = 1 for every page p.
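The special cases listed above differ only in how SIZE and COST are tied together. A minimal sketch of that relationship follows; the `Page` class, the `make_page` helper and the model names passed as strings are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    name: str
    size: int
    cost: float


def make_page(name: str, size: int, model: str, cost: float = 1.0) -> Page:
    """Build a page whose size and cost obey the chosen special case."""
    if model == "uniform":  # SIZE(p) = COST(p) = 1
        return Page(name, 1, 1.0)
    if model == "cost":  # weighted caching: SIZE(p) = 1, arbitrary cost
        return Page(name, 1, cost)
    if model == "bit":  # COST(p) = SIZE(p)
        return Page(name, size, float(size))
    if model == "fault":  # COST(p) = 1, arbitrary size
        return Page(name, size, 1.0)
    if model == "general":  # no restriction
        return Page(name, size, cost)
    raise ValueError(f"unknown model: {model}")
```

A page of size three thus costs 3 in the bit model and 1 in the fault model.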

General caching is still a thriving area of research as may be documented by the 
fact that a randomized online algorithm with the optimal asymptotic competitive 
ratio $O(\log C)$ was given only recently by Adamaszek et al. [1]; this article may
also serve as a reference on the current state of knowledge. 

Caching, as described so far, requires the service to load the requested page 
when a fault occurs, which is known as caching under the forced policy. Allowing 
the service to pay the fault cost without actually loading the requested page to 
the cache gives another useful and studied variant of caching, the optional policy.

Previous work. In this article, we consider the problem of finding the optimal
service in the offline version of caching when the whole request sequence is
known in advance. Uniform caching is solvable in polynomial time with a natural
algorithm known as Belady's rule [5]. Caching in the cost model is a special
case of the k-server problem and is also solvable in polynomial time [7]. In the late
1990s, questions about the complexity status of general caching were raised.
The situation was summed up by Albers et al. [2]: “The hardness results for
caching problems are very inconclusive. The NP-hardness result for the Bit model
uses a reduction from PARTITION, which has pseudopolynomial algorithms. Thus
a similar algorithm may well exist for the Bit model. We do not know whether
computing the optimum in the Fault model is NP-hard.”

There was no improvement until a breakthrough in 2010 when Chrobak et 
al. [8] showed that general caching is strongly NP-hard, already in the case of 
the fault model as well as in the case of the bit model. General caching is usually 
studied under the assumption that the largest page size is very small in comparison
with the total cache size, as is for example the case of the aforementioned
article by Albers et al. [2]. Instances of caching with pages larger than half of
the cache size (so-called obstacles) are required in the proof given by Chrobak
et al. Therefore, this hardness result is in fact still quite inconclusive.

Contribution. We give a novel proof of strong NP-hardness for general caching
which gives the first hardness result restricted to small pages:

Theorem 1.1. General caching is strongly NP-hard even in the case when the 
page sizes are limited to {1, 2, 3}, for both the fault model and the bit model, and 
under each of the forced and optional policies. 

The proof of the result for general costs (and sizes {1,2,3}) is rather simple, 
in particular significantly simpler than the one given by Chrobak et al. [8]. 
The reductions for the result in the fault and bit models are significantly more 
involved and require a non-trivial potential-function-like argument. 

Relation to interval packing. As observed by Chrobak et al. [8], there is 
a tight connection between caching under the optional policy and the problem 
called interval packing. A set of weighted intervals together with a limit W is 
given in the problem and the target is either to (a) choose a subset of intervals of 
the maximum cardinality such that at each point the total weight of intersected
intervals is limited by W (corresponds to caching in the fault model), or (b)
choose a subset of intervals of the maximum total weight such that at each point
the total weight of intersected intervals is limited by W (corresponds to the
bit model). The problem is a special case of the unsplittable flow problem on
line graphs [3]. Our proofs can be considered as proofs of strong NP-hardness
of either version of interval packing. See Chrobak et al. [8] for details on the 
connection between interval packing and caching. 

Open problems. We prove that general caching with page sizes {1,2,3} is 
strongly NP-hard while general caching with unit page sizes is easily polynomially
solvable. Closing the gap will definitely contribute to a better understanding
of the problem: 

Question 1.2. Is general caching also (strongly) NP-hard when page sizes are 
limited to {1,2}? Can caching with page sizes {1,2} be solved in polynomial 
time, at least in the bit or fault model? 

In a broader perspective, the complexity status of general caching is still far
from being well understood as the best known result on approximation is a
4-approximation due to Bar-Noy et al. [4] and there is no result on the hardness
of approximation. Therefore, better understanding of the complexity of general
caching remains an important challenge.

Question 1.3. Is there an algorithm for general caching with an approximation
ratio better than 4, or even a PTAS? Is general caching APX-hard?


Outline. The main part of our work, a polynomial-time reduction from independent
set to caching in the fault model under the optional policy with page
sizes restricted to {1, 2, 3}, is explained in Section 2 and its validity is proven in
Section 3. In Section 4, we show how to modify the reduction so that it works for 
the bit model as well. In Section 5, we show how to obtain the hardness results 
also for the forced policy. Finally, we give a self-contained presentation of the 
simple proof of strong NP-hardness for general costs (in fact, only two different 
and polynomial costs are needed) in Appendix A. 


2 Reduction 

The decision problem IndependentSet is well-known to be NP-complete. By 
3Caching(forced) and 3Caching(optional) we denote the decision versions 
of caching under each policy with page sizes restricted to {1,2,3}. 


Problem: IndependentSet

Instance: A graph G and a number K.

Question: Is there an independent set of cardinality K in G?

Problem: 3Caching(policy)

Instance: A universe of pages, a sequence of page requests, numbers C
and L. For each page p it holds SIZE(p) ∈ {1, 2, 3}.

Question: Is there a service under the policy policy of the request sequence
using the cache of size C with a total fault cost of at
most L?

We define 3Caching(fault, policy) to be the problem 3Caching(policy)
with the additional requirement that page costs adhere to the fault model. The
problem 3Caching(bit, policy) is defined analogously.
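For experimenting with tiny instances, the source problem IndependentSet can be decided by exhaustive search. The sketch below is only a reference checker (exponential in the number of vertices); the function name and the encoding of vertices as integers 0, ..., n-1 are assumptions.

```python
from itertools import combinations
from typing import Iterable, Tuple


def has_independent_set(n: int, edges: Iterable[Tuple[int, int]], k: int) -> bool:
    """Return True if the graph on vertices 0..n-1 has an independent set
    of cardinality k (brute force over all k-subsets of vertices)."""
    edge_list = list(edges)
    for subset in combinations(range(n), k):
        chosen = set(subset)
        # The subset is independent if no edge has both endpoints in it.
        if not any(u in chosen and v in chosen for u, v in edge_list):
            return True
    return False
```

On the path with edges (0, 1) and (1, 2), the checker finds an independent set of cardinality 2 but none of cardinality 3.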

In this section, we describe a polynomial-time reduction from IndependentSet
to 3Caching(fault,optional). Informally, a set of pages of size two
and three is associated with each edge and a page of size one is associated with
each vertex. Each vertex-page is requested only twice while there are many requests
on pages associated with edges. The request sequence is designed in such
a way that the number of vertex-pages that are cached between the two requests
in the optimal service is equal to the size of the maximum independent set.

We now show the request sequence of caching corresponding to the graph 
given in IndependentSet with a parameter H. In the next section, we prove 
that it is possible to set a proper value of H and a proper fault cost limit L such 
that the reduction becomes a valid polynomial-time reduction. 

Reduction 2.1. Let G = (V, E) be the instance of IndependentSet. The
graph G has n vertices and m edges and there is an arbitrary fixed order of
edges $e_1, \ldots, e_m$. Let H be a parameter bounded by a polynomial function of n.

A corresponding instance $\mathcal{I}_G$ of 3Caching(fault,optional) is an instance
with the cache size C = 2mH + 1 and the total of 6mH + n pages. The structure
of the pages and the request sequence is described below.

Pages. For each vertex v, we have a vertex-page $p_v$ of size one. For each edge e,
there are 6H edge-pages associated with it that are divided into H groups. The
ith group consists of six pages $\bar a_i^e, \alpha_i^e, a_i^e, b_i^e, \beta_i^e, \bar b_i^e$ where pages $\alpha_i^e$ and $\beta_i^e$ have
size three and the remaining four pages have size two.

For a fixed edge e, let $a^e$-pages be all pages $a_i^e$ for $i = 1, \ldots, H$. Let also
a-pages be all $a^e$-pages for $e = e_1, \ldots, e_m$. The remaining collections of pages
($\alpha^e$-pages, $\alpha$-pages, ...) are defined in a similar fashion.

Request sequence. The request sequence of $\mathcal{I}_G$ is organized in phases and
blocks. There is one phase for each vertex $v \in V$; we call such a phase the v-phase.
There are exactly two requests on each vertex-page $p_v$, one just before
the beginning of the v-phase and one just after the end of the v-phase; these
requests do not belong to any phase. The order of phases is arbitrary. In each v-phase,
there are 2H adjacent blocks associated with every edge e incident with v;
the blocks for different incident edges are ordered arbitrarily. In addition, there
is one initial block I before all phases and one final block F after all phases.
Altogether, there are d = 4mH + 2 blocks.
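The layout of phases, blocks and vertex-page requests can be sketched as follows. This is an illustration only: the tuple labels, the function name `block_skeleton`, and the choice to order phases by the position of the vertex in the input list are assumptions. A block is labelled `(edge index, group i, j)`, where j is 1 or 2 in the earlier phase of the edge and 3 or 4 in the later one.

```python
from typing import Hashable, List, Sequence, Tuple


def block_skeleton(
    vertices: Sequence[Hashable],
    edges: Sequence[Tuple[Hashable, Hashable]],
    H: int,
) -> List[tuple]:
    """List the blocks and vertex-page requests in the order they occur."""
    order = {v: idx for idx, v in enumerate(vertices)}
    skeleton: List[tuple] = [("block", "I")]
    for v in vertices:
        skeleton.append(("vertex_request", v))  # just before the v-phase
        for e_idx, (x, y) in enumerate(edges, start=1):
            if v not in (x, y):
                continue
            other = y if v == x else x
            # Earlier phase of the edge uses j = 1, 2; the later one j = 3, 4.
            first, second = (1, 2) if order[v] < order[other] else (3, 4)
            for i in range(1, H + 1):
                skeleton.append(("block", (e_idx, i, first)))
                skeleton.append(("block", (e_idx, i, second)))
        skeleton.append(("vertex_request", v))  # just after the v-phase
    skeleton.append(("block", "F"))
    return skeleton
```

For three vertices u, v, w, the edges {u, w} and {v, w}, and H = 2, the skeleton contains 18 blocks, which agrees with d = 4mH + 2.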

Let e = {u, v} be an edge and let us assume that the u-phase precedes the
v-phase. The blocks associated with e in the u-phase are denoted by
$B^e_{1,1}, B^e_{1,2}, \ldots, B^e_{i,1}, B^e_{i,2}, \ldots, B^e_{H,1}, B^e_{H,2}$, in this order, and the blocks in the v-phase
are denoted by $B^e_{1,3}, B^e_{1,4}, \ldots, B^e_{i,3}, B^e_{i,4}, \ldots, B^e_{H,3}, B^e_{H,4}$, in this order. An example
is given in Fig. 1.

Fig. 1. An example of phases, blocks and requests on vertex-pages for a graph with
three vertices u, v, w and two edges $e_1 = \{u, w\}$, $e_2 = \{v, w\}$ when H = 2


Even though each block is associated with some fixed edge, it contains one or 
more requests to the associated pages for every edge e. In each block, we process 
the edges in the order $e_1, \ldots, e_m$ that was fixed above. Pages associated with the
edge e are requested in two rounds. In each round, we process groups 1, ..., H
in this order. When processing the ith group of the edge e, we request one or
more pages of this group, depending on the block we are in. Table 1 determines 
which pages are requested. 

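Table 1. Requests associated with an edge e

Block                                    First round • Second round

before $B^e_{i,1}$                       $\bar a_i^e$ •
$B^e_{i,1}$                              $\bar a_i^e, \alpha_i^e$ • $b_i^e$
$B^e_{i,2}$                              $\alpha_i^e, a_i^e$ • $b_i^e$
between $B^e_{i,2}$ and $B^e_{i,3}$      $a_i^e$ • $b_i^e$
$B^e_{i,3}$                              $a_i^e$ • $b_i^e, \beta_i^e$
$B^e_{i,4}$                              $a_i^e$ • $\beta_i^e, \bar b_i^e$
after $B^e_{i,4}$                        • $\bar b_i^e$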
Reduction 2.1 is now complete. An example of requests on edge-pages associated
with one edge e is depicted in Fig. 2. Notice that the order of the pages
associated with e is the same in all blocks; more precisely, in each block the
requests on the pages associated with e form a subsequence of

$$\bar a_1^e\, \alpha_1^e\, a_1^e \,\ldots\, \bar a_i^e\, \alpha_i^e\, a_i^e \,\ldots\, \bar a_H^e\, \alpha_H^e\, a_H^e \;\; b_1^e\, \beta_1^e\, \bar b_1^e \,\ldots\, b_i^e\, \beta_i^e\, \bar b_i^e \,\ldots\, b_H^e\, \beta_H^e\, \bar b_H^e. \qquad (1)$$


Preliminaries for the proof. Instead of minimizing the service cost, we maximize
the savings compared to the service which does not use the cache at all.
This is clearly equivalent when considering the decision versions of the problems. 

Without loss of generality, we may assume that any page is brought into the 
cache only immediately before some request to that page and removed from the 
cache only after some (possibly different) request to that page; furthermore, at 
the beginning and at the end the cache is empty. That is, a page may be in the cache
only between two consecutive requests to this page, and either remains in the
cache for the whole interval or not at all.
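Under this normal form, a service under the optional policy is just a choice of intervals between consecutive requests to the same page, and the savings are the costs of the chosen intervals. The sketch below makes this concrete by exhaustive search on tiny instances. It assumes that feasibility can be checked in the gaps between consecutive requests; the function name and the dictionary-based encoding of sizes and costs are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def max_savings(
    requests: Sequence[Hashable],
    size: Dict[Hashable, int],
    cost: Dict[Hashable, float],
    cache_size: int,
) -> float:
    """Maximum savings under the optional policy (exponential brute force)."""
    # One interval per pair of consecutive requests to the same page.
    last_seen: Dict[Hashable, int] = {}
    intervals: List[Tuple[int, int, Hashable]] = []
    for t, p in enumerate(requests):
        if p in last_seen:
            intervals.append((last_seen[p], t, p))
        last_seen[p] = t

    best = 0.0
    for r in range(len(intervals) + 1):
        for chosen in combinations(intervals, r):
            # In the gap after request t, the cache holds every chosen
            # interval (s, e, p) with s <= t < e; it must fit in the cache.
            feasible = all(
                sum(size[p] for s, e, p in chosen if s <= t < e) <= cache_size
                for t in range(len(requests) - 1)
            )
            if feasible:
                best = max(best, sum(cost[p] for _, _, p in chosen))
    return best
```

With unit costs this solves the fault-model variant, and with `cost[p] = size[p]` the bit-model variant; the total fault cost equals the cost of serving every request without a cache minus the returned savings.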

Each page of size three is requested only twice in two consecutive blocks, and 
these blocks are distinct for all pages of size three. Thus, a service of edge-pages 
is valid if and only if at each time, at most mH edge-pages are in the cache. It 
is convenient to think of the cache as of mH slots for edge-pages. 

As each vertex-page is requested twice, the savings on the n vertex-pages are 
at most n. Furthermore, a vertex-page can be cached if and only if during the 
phase it never happens that at the same time all slots for edge-pages are full and 
a page of size three is cached. 
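The slot argument is plain arithmetic on the cache size C = 2mH + 1. A minimal sketch of the check, with a hypothetical helper name and argument names:

```python
def edge_pages_fit(
    num_size_two: int,
    num_size_three: int,
    m: int,
    H: int,
    with_vertex_page: bool = False,
) -> bool:
    """Check whether the given pages fit into the cache of size 2mH + 1."""
    cache_size = 2 * m * H + 1
    used = 2 * num_size_two + 3 * num_size_three
    if with_vertex_page:
        used += 1  # a vertex-page has size one
    return used <= cache_size
```

With m = 2 and H = 3 there are mH = 6 slots: six pages of size two leave room for a vertex-page, while five pages of size two together with one page of size three already fill the cache.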

Let $S_B$ denote the set of all edge-pages cached at the beginning of the block B
and let $S_B^e$ be the set of pages in $S_B$ associated with the edge e. We use $s_B = |S_B|$
and $s_B^e = |S_B^e|$ for the sizes of the sets. Each edge-page is requested only in
a contiguous segment of blocks, once in each block. It follows that the total
savings on edge-pages are equal to $\sum_B s_B$ where the sum is over all blocks.
In particular, the maximal possible savings on the edge-pages are $(d-1)mH$,
using the fact that $S_I$ is empty. We shall show that the maximum savings are
$(d-1)mH + K$ where K is the size of the maximum independent set in G.



Fig. 2. Requests on all pages associated with the edge e when H = 3. Each column
represents some block(s). The labelled columns represent the blocks in the heading,
the first column represents every block before $B^e_{1,1}$, the middle column represents every
block between $B^e_{3,2}$ and $B^e_{1,3}$, and the last column represents every block after $B^e_{3,4}$.
The requests in one column are ordered from top to bottom.

Almost-fault model. To understand the reduction, we consider what happens
if we relax the requirements of the fault model and set the cost of each vertex-page
to 1/(n + 1) instead of 1 as required by the fault model.

In this scenario, the total savings on vertex-pages are n/(n + 1) < 1 which 
is less than savings incurred by one edge-page. Therefore, edge-pages must be 
served optimally in the optimal service of the whole request sequence. 

In this case, the reduction works already for H = 1. This leads to a quite 
short proof of the strong NP-hardness for general caching and we give this proof 
in Appendix A. Here, we show just the main ideas that are important also for 
the design of our caching instance in the fault and bit models. 

We first prove that for each edge e and each block $B \neq I$ we have $s^e_B = 1$ (see
Appendix A). Using this we show below that for each edge e, at least one of the
pages $\alpha_1^e$ and $\beta_1^e$ is cached between its two requests. This implies that the set of
all vertices v such that $p_v$ is cached between its two requests is independent.

For a contradiction, let us assume that for some edge e, neither of the pages
$\alpha_1^e$ and $\beta_1^e$ is cached between its two requests. Because pages $\alpha_1^e$ and $\beta_1^e$ are
forbidden, there is $b_1^e$ in $S_{B^e_{1,2}}$ and $a_1^e$ in $S_{B^e_{1,4}}$. Somewhere between these two
blocks $B^e_{1,2}$ and $B^e_{1,4}$, we must switch from caching $b_1^e$ to caching $a_1^e$. However,
this is impossible, because the order of requests implies that we would have to
cache both $b_1^e$ and $a_1^e$ at some moment (see Fig. 3). However, there is no place
in the cache for such an operation, as $s^{e'}_B = 1$ for every $e'$ and $B \neq I$.



Fig. 3. Pages associated with one edge when H = 1 

In the fault model, the corresponding claim $s^e_B = H$ does not hold. Instead,
we prove that the value of $s^e_B$ cannot change much during the service and when
we use H large enough, we still get a working reduction.

3 Proof of Correctness 

In this section, we show that the reduction described in the previous section is
indeed a reduction from IndependentSet to 3Caching(fault,optional).
We prove that there is an independent set of cardinality K in G if and only
if there is a service of the caching instance $\mathcal{I}_G$ with the total savings of at
least $(d-1)mH + K$. First the easy direction, which holds for any value of the
parameter H.



Lemma 3.1. Let G be a graph and $\mathcal{I}_G$ the corresponding caching instance from
Reduction 2.1. Suppose that there is an independent set W of cardinality K in G.
Then there exists a service of $\mathcal{I}_G$ with the total savings of at least $(d-1)mH + K$.

Proof. For any edge e, denote e = {u, v} so that the u-phase precedes the v-phase.
If $u \in W$, we keep all $\bar a^e$-pages, $b^e$-pages, $\beta^e$-pages and $\bar b^e$-pages in the
cache from the first to the last request on each page, but we do not cache $\alpha^e$-pages
and $a^e$-pages at any time. Otherwise, we cache all $\bar a^e$-pages, $\alpha^e$-pages, $a^e$-pages
and $\bar b^e$-pages, but do not cache $b^e$-pages and $\beta^e$-pages at any time. Fig. 4 shows
these two cases for the first group of pages. In both cases, at each time at most
one page associated with each group of each edge is in the cache and the savings
on those pages are $(d-1)mH$. We know that the pages fit in the cache because
of the observations made in Section 2.

For any $v \in W$, we cache $p_v$ between its two requests. To check that this is
a valid service, observe that if $v \in W$, then during the corresponding phase no
page of size three is cached. Thus, the page $p_v$ always fits in the cache together
with at most mH pages of size two. □
Fig. 4. The two ways of caching in Lemma 3.1


We prove the converse in a sequence of lemmata. In Section 4, we will show 
how to reuse the proof for the bit model. To be able to do that, we list explicitly 
all the assumptions about the caching instance that are used in the following 
proofs. 

Properties 3.2. Let $\mathcal{T}_G$ be an instance of general caching corresponding to
a graph G = (V, E) with n vertices, m edges $e_1, \ldots, e_m$, the same cache size and
the same universe of pages as in Reduction 2.1. The request sequence is again
split into phases, one phase for each vertex. Each phase is again partitioned into
blocks, there is one initial block I before all phases and one final block F after
all phases. There is the total of d blocks.

The instance $\mathcal{T}_G$ is required to fulfill the following list of properties:

(a) Each vertex-page $p_v$ is requested exactly twice, right before the v-phase and
right after the v-phase.

(b) The total savings incurred on edge-pages are equal to $\sum_B s_B$ (summing over
all blocks).

(c) For each edge e, there are exactly H pages associated with e requested in I,
all the $\bar a^e$-pages, and exactly H pages associated with e requested in F, all
the $\bar b^e$-pages.
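Table 2. Request sequences on all pages associated with an edge e

Block                                    First round • Second round

before $B^e_{1,1}$                       $\bar a_1^e \ldots \bar a_H^e$ •
$B^e_{i,1}$                              $a_1^e \ldots a_{i-1}^e \; \bar a_i^e \alpha_i^e \; \bar a_{i+1}^e \bar a_{i+2}^e \ldots \bar a_H^e$ • $b_1^e \ldots b_i^e$
$B^e_{i,2}$                              $a_1^e \ldots a_{i-1}^e \; \alpha_i^e a_i^e \; \bar a_{i+1}^e \bar a_{i+2}^e \ldots \bar a_H^e$ • $b_1^e \ldots b_i^e$
between $B^e_{H,2}$ and $B^e_{1,3}$      $a_1^e \ldots a_H^e$ • $b_1^e \ldots b_H^e$
$B^e_{i,3}$                              $a_i^e \ldots a_H^e$ • $\bar b_1^e \ldots \bar b_{i-1}^e \; b_i^e \beta_i^e \; b_{i+1}^e b_{i+2}^e \ldots b_H^e$
$B^e_{i,4}$                              $a_i^e \ldots a_H^e$ • $\bar b_1^e \ldots \bar b_{i-1}^e \; \beta_i^e \bar b_i^e \; b_{i+1}^e b_{i+2}^e \ldots b_H^e$
after $B^e_{H,4}$                        • $\bar b_1^e \ldots \bar b_H^e$
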


(d) In each block, pages associated with $e_1$ are requested first, then pages associated
with $e_2$ are requested and so on up to $e_m$.

(e) For each block B and each edge e, all requests on $a^e$-pages and $\bar b^e$-pages in B
precede all requests on $\bar a^e$-pages and $b^e$-pages in B.

(f) Let e = {u, v} be an edge and p an $\alpha^e$-page or $\beta^e$-page. Let B be the first
block and $\bar B$ the last block where p is requested. Then B and $\bar B$ are either
both in the u-phase or both in the v-phase. Furthermore, no other page of
size three associated with e is requested in B, $\bar B$, or any block between them.

Lemma 3.3. The instance from Reduction 2.1 satisfies Properties 3.2. 

Proof. All properties (a), (b), (c), (d), (f) follow directly from Reduction 2.1 and
the subsequent observations. To prove (e), recall that the pages associated with
an edge e requested in a particular block always follow the ordering (1). We need
to verify that when the page $a_i^e$ is requested, no page $\bar a_j^e$ for $j \le i$ is requested
and that when the page $\bar b_i^e$ is requested, no $\bar a^e$-page and no page $b_j^e$ for $j \le i$
is requested. This can be seen easily when we explicitly write down the request
sequences for each kind of block, see Table 2. □
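The check in this proof can also be carried out mechanically. The sketch below encodes one reading of Table 2 and should be taken as an assumption rather than as the definitive request sequences: in the blocks $B^e_{i,1}$ and $B^e_{i,2}$ the second round requests $b_1^e, \ldots, b_i^e$, and in the blocks $B^e_{i,3}$ and $B^e_{i,4}$ the first round requests $a_i^e, \ldots, a_H^e$. Pages are encoded as hypothetical pairs such as `("abar", i)` or `("beta", i)`.

```python
from typing import List, Tuple

Page = Tuple[str, int]


def canonical_order(H: int) -> List[Page]:
    """The ordering (1) of all pages associated with one edge."""
    first = [(k, i) for i in range(1, H + 1) for k in ("abar", "alpha", "a")]
    second = [(k, i) for i in range(1, H + 1) for k in ("b", "beta", "bbar")]
    return first + second


def requests_in_block(kind, i: int, H: int) -> List[Page]:
    """Requests of one edge in a block; kind is "before", 1, 2, "between",
    3, 4 or "after", and i is the group index (ignored for the string kinds)."""
    if kind == "before":
        return [("abar", j) for j in range(1, H + 1)]
    if kind == "after":
        return [("bbar", j) for j in range(1, H + 1)]
    if kind == "between":
        return [("a", j) for j in range(1, H + 1)] + [("b", j) for j in range(1, H + 1)]
    if kind in (1, 2):
        mid = [("abar", i), ("alpha", i)] if kind == 1 else [("alpha", i), ("a", i)]
        first = [("a", j) for j in range(1, i)] + mid
        first += [("abar", j) for j in range(i + 1, H + 1)]
        return first + [("b", j) for j in range(1, i + 1)]
    if kind in (3, 4):
        mid = [("b", i), ("beta", i)] if kind == 3 else [("beta", i), ("bbar", i)]
        second = [("bbar", j) for j in range(1, i)] + mid
        second += [("b", j) for j in range(i + 1, H + 1)]
        return [("a", j) for j in range(i, H + 1)] + second
    raise ValueError(f"unknown block kind: {kind}")


def is_subsequence(seq: List[Page], full: List[Page]) -> bool:
    it = iter(full)
    return all(x in it for x in seq)  # each membership test consumes the iterator


def satisfies_property_e(seq: List[Page]) -> bool:
    """All requests on a- and bbar-pages precede all on abar- and b-pages."""
    late_seen = False
    for kind, _ in seq:
        if kind in ("abar", "b"):
            late_seen = True
        elif kind in ("a", "bbar") and late_seen:
            return False
    return True
```

Running both checks on every kind of block for small values of H confirms that, under this reading, each block follows the ordering (1) and satisfies property (e).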

For the following claims, let $\mathcal{T}_G$ be an instance fulfilling Properties 3.2. We
fix a service of $\mathcal{T}_G$ with the total savings of at least $(d-1)mH$.

Let $\mathcal{B}$ be the set of all blocks and $\bar{\mathcal{B}}$ the set of all blocks except for the initial
and final one. For a block B, we denote the block immediately following it by B'.

We define two useful values characterizing the service for the block B: $\delta_B =
mH - s_B$ (the number of free slots for edge-pages at the start of the service of
the block) and $\gamma^e_B = |s^e_{B'} - s^e_B|$ (the change of the number of slots occupied by
pages associated with e after requests from this block are served).

The first easy lemma says that only a small number of blocks can start with 
some free slots in the cache. 

Lemma 3.4. When summing over all blocks except for the initial one,

$$\sum_B \delta_B \le n.$$

Proof. Using the property (b) and $s_I = 0$, the savings on edge-pages are

$$\sum_B s_B = (d-1)mH - \sum_B \delta_B.$$

The total savings are assumed to be at least $(d-1)mH$. Due to the property (a),
the savings on vertex-pages are at most n. The claim of the lemma follows. □

The second lemma states that the number of slots occupied by pages associ¬ 
ated with a given edge does not change much during the whole service. 

Lemma 3.5. For each edge $e \in E$,

$$\sum_{B \in \bar{\mathcal{B}}} \gamma^e_B \le 6n.$$

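Proof. Let us use the notation $S_B^{\le k} = S_B^{e_1} \cup \cdots \cup S_B^{e_k}$ and $s_B^{\le k} = |S_B^{\le k}|$. First, we
shall prove for each $k \le m$

$$\sum_{B \in \bar{\mathcal{B}}} \left| s_{B'}^{\le k} - s_B^{\le k} \right| \le 3n. \qquad (2)$$

Let $\mathcal{P}$ denote the set of all blocks B from $\bar{\mathcal{B}}$ satisfying $s_{B'}^{\le k} - s_B^{\le k} > 0$ and let
$\mathcal{N}$ denote the set of all the remaining blocks from $\bar{\mathcal{B}}$.

As a consequence of the property (c), we get $s_{I'}^{\le k} \in [kH - \delta_{I'}, kH]$ and
$s_F^{\le k} \in [kH - \delta_F, kH]$. So we obtain the inequality

$$s_F^{\le k} - s_{I'}^{\le k} \ge -\delta_F. \qquad (3)$$

We claim $s_{B'}^{\le k} - s_B^{\le k} \le \delta_B$ for each $B \in \bar{\mathcal{B}}$. We assume for a contradiction
$s_{B'}^{\le k} - s_B^{\le k} > \delta_B$ for some block B. We use the property (d). Then after processing
the edge $e_k$ in B, the number of edge-pages in the cache is $s_B - s_B^{\le k} + s_{B'}^{\le k} > s_B +
\delta_B = mH$. But more than mH edge-pages in the cache means a contradiction.
The summation over all blocks from $\mathcal{P}$ gives us the first bound

$$\sum_{B \in \mathcal{P}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) \le \sum_{B \in \mathcal{P}} \delta_B \le n. \qquad (4)$$

Using the fact $\mathcal{P} \cup \mathcal{N} = \bar{\mathcal{B}}$ and (3), we have

$$\sum_{B \in \mathcal{P}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) + \sum_{B \in \mathcal{N}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) = s_F^{\le k} - s_{I'}^{\le k} \ge -\delta_F;$$

together with (4), we obtain the second bound

$$-\sum_{B \in \mathcal{N}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) \le \sum_{B \in \mathcal{P}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) + \delta_F \le 2n. \qquad (5)$$

Combining the bounds (4) and (5), we prove (2)

$$\sum_{B \in \bar{\mathcal{B}}} \left| s_{B'}^{\le k} - s_B^{\le k} \right| = \sum_{B \in \mathcal{P}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) - \sum_{B \in \mathcal{N}} \left( s_{B'}^{\le k} - s_B^{\le k} \right) \le n + 2n = 3n.$$

For the edge $e_1$, the claim of this lemma is weaker than (2) because $\gamma_B^{e_1} =
|s_{B'}^{\le 1} - s_B^{\le 1}|$. Proving our lemma for $e_k$ when $k > 1$ is just a matter of using (2)
for $k - 1$ and $k$ together with the formula $|x - y| \le |x| + |y|$:

$$\sum_{B \in \bar{\mathcal{B}}} \gamma_B^{e_k} \le \sum_{B \in \bar{\mathcal{B}}} \left| s_{B'}^{\le k} - s_B^{\le k} \right| + \sum_{B \in \bar{\mathcal{B}}} \left| s_{B'}^{\le k-1} - s_B^{\le k-1} \right| \le 3n + 3n = 6n. \qquad □$$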

For the rest of the proof, we set H = 6mn + 3n + 1. This enables us to show
that the fixed service must cache some of the pages of size three.

Lemma 3.6. For each edge $e \in E$, there is a block B such that some $\alpha^e$-page
or $\beta^e$-page is in $S_B$ and $\delta_B = 0$.

Proof. Fix an edge $e = e_k$. For each block B, we define

$$\varepsilon_B = \text{number of } \alpha^e\text{-pages and } \beta^e\text{-pages in } S_B.$$

Observe that due to the property (f), $\varepsilon_B$ is always one or zero. We use
a potential function

$$\Phi_B = \text{number of } a^e\text{-pages and } \bar b^e\text{-pages in } S_B.$$

Because there are only $\bar a$-pages in the initial block and only $\bar b$-pages in the
final block (property (c)), we know

$$\Phi_{I'} = 0 \quad \text{and} \quad \Phi_F \ge H - \delta_F. \qquad (6)$$

Now we bound the increase of the potential function as

$$\Phi_{B'} - \Phi_B \le \delta_B + \sum_{\ell=1}^{k-1} \gamma_B^{e_\ell} + \varepsilon_B. \qquad (7)$$

To justify this bound, we fix a block B and look at the cache state after
requests on edges $e_1, \ldots, e_{k-1}$ are processed. How many free slots can there be
in the cache? There are initial $\delta_B$ free slots in the beginning of the block B, and
the number of free slots can be further increased when the number of pages in
the cache associated with $e_1, \ldots, e_{k-1}$ decreases. This increase can be naturally
bounded by $\sum_{\ell=1}^{k-1} \gamma_B^{e_\ell}$. Therefore, the number of free slots in the cache is at most

$$\delta_B + \sum_{\ell=1}^{k-1} \gamma_B^{e_\ell}.$$

Because of the property (e), the number of cached $a^e$-pages and $\bar b^e$-pages
can only increase by using the free cache space or caching new pages instead of
$\alpha^e$-pages and $\beta^e$-pages. We already bounded the number of free slots and $\varepsilon_B$ is
a natural bound for the increase gained on $\alpha^e$-pages and $\beta^e$-pages. Thus, the
bound (7) is correct.

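Summing (7) over all $B \in \bar{\mathcal{B}}$, we have

$$\Phi_F - \Phi_{I'} = \sum_{B \in \bar{\mathcal{B}}} \left( \Phi_{B'} - \Phi_B \right) \le \sum_{B \in \bar{\mathcal{B}}} \left( \delta_B + \sum_{\ell=1}^{k-1} \gamma_B^{e_\ell} + \varepsilon_B \right)$$

which we combine with (6) into

$$H - \delta_F \le \sum_{B \in \bar{\mathcal{B}}} \left( \delta_B + \sum_{\ell=1}^{k-1} \gamma_B^{e_\ell} + \varepsilon_B \right)$$

and use Lemmata 3.4 and 3.5 to bound $\sum \varepsilon_B$ as

$$\sum_{B \in \bar{\mathcal{B}}} \varepsilon_B \ge H - \delta_F - \sum_{B \in \bar{\mathcal{B}}} \left( \delta_B + \sum_{\ell=1}^{k-1} \gamma_B^{e_\ell} \right) \ge H - n - n - (k-1)6n \ge H - 6mn - 2n = n + 1.$$

As there is at most one page of size three requested in each block (property (f)),
the inequality $\sum \varepsilon_B \ge n + 1$ implies that there are at least n + 1 blocks
where an $\alpha^e$-page or a $\beta^e$-page is cached. At most n blocks have $\delta_B$ non-zero
(Lemma 3.4); we are done. □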

We are ready to complete the proof of the harder direction. 

Lemma 3.7. Suppose that there exists a service of $\mathcal{T}_G$ with the total savings
of at least $(d-1)mH + K$. Then the graph G has an independent set W of
cardinality K.

Proof. Let W be a set of K vertices such that the corresponding page $p_v$ is
cached between its two requests. (There are at least K of them because the
maximal savings on edge-pages are $(d-1)mH$.)

Consider an arbitrary edge e = {u, v}. Due to Lemma 3.6, there exists
a block B such that $\delta_B = 0$ and some $\alpha^e$-page or $\beta^e$-page is cached in the
beginning of the block. This block B is either in the u-phase or in the v-phase,
because of the statement of the property (f). This means that at least one of the
two pages $p_u$ and $p_v$ is not cached between its two requests, because the cache
is full. As a consequence, the set W is indeed independent. □

The value of H was set to 6mn + 3n + 1, therefore Reduction 2.1 is indeed
polynomial. Lemmata 3.1, 3.3 and 3.7 together imply that there is an independent
set of cardinality K in G if and only if there is a service of the instance
$\mathcal{I}_G$ with the total savings of at least $(d-1)mH + K$. We showed that the problem
3Caching(fault,optional) is indeed strongly NP-hard.

4 Bit Model

In this section, we show how to modify the proof for the fault model from the 
previous sections so that it works as a proof for the bit model as well. 

Reduction 4.1. Let G be a graph and $\mathcal{I}_G$ the corresponding instance of the
problem 3Caching(fault,optional) from Reduction 2.1. Then the modified
instance $\mathcal{I}'_G$ is an instance of 3Caching(bit,optional) with the same cache
size and the same set of pages with the same sizes.

The structure of phases and requests on vertex-pages is also preserved. The
blocks from $\mathcal{I}_G$ are also used, but between each pair of consecutive blocks there
are five new blocks inserted. Let B and B' be two consecutive blocks. Between
B and B' we insert five new blocks $B_{(1)}, \ldots, B_{(5)}$ with the following requests:

— $B_{(1)}$: do not request anything;

— $B_{(2)}$: request all pages of size two that are requested both in B and B';

— $B_{(3)}$: request a page (there is either one or none) of size three that is requested
both in B and B';

— $B_{(4)}$: request all pages of size two that are requested both in B and B';

— $B_{(5)}$: do not request anything.

See Fig. 5 for an example. In each new block, the order of chosen requests is
the same as in B (which is the same as in B', as both follow the same ordering
of edges and the ordering (1)). The new instance has the total of $d' = d + 5(d-1) = 6d - 5$ blocks.
This time we prove that the maximal total savings are $(d'-1)mH + K$ where K
is the cardinality of the maximum independent set in G.
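The insertion of the five new blocks can be sketched as follows. The function names and the representation of a block as a list of page names in request order are assumptions made for illustration; vertex-page requests, which lie outside the blocks, are not modelled.

```python
from typing import Dict, Hashable, List, Sequence


def inserted_blocks(
    block: Sequence[Hashable],
    next_block: Sequence[Hashable],
    size: Dict[Hashable, int],
) -> List[List[Hashable]]:
    """The five blocks placed between two consecutive blocks B and B'."""
    in_next = set(next_block)
    shared = [p for p in block if p in in_next]  # keeps the order used in B
    twos = [p for p in shared if size[p] == 2]
    threes = [p for p in shared if size[p] == 3]
    return [[], twos, threes, list(twos), []]


def expand(
    blocks: Sequence[Sequence[Hashable]], size: Dict[Hashable, int]
) -> List[List[Hashable]]:
    """Insert five new blocks between every pair of consecutive blocks."""
    out: List[List[Hashable]] = []
    for current, following in zip(blocks, blocks[1:]):
        out.append(list(current))
        out.extend(inserted_blocks(current, following, size))
    out.append(list(blocks[-1]))
    return out
```

For two consecutive blocks that both request a page of size two and a page of size three, the expansion has seven blocks. The page of size two is requested four times (three possible hits, each saving two) and the page of size three three times (two possible hits, each saving three), so both pages yield savings of six in the bit model.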

Fig. 5. The modification of the instance for a page $a$ of size two and a page $\beta$ of size
three

Lemma 4.2. Suppose that the graph G has an independent set W of cardinality K.
Then there exists a service of the modified instance $\mathcal{I}'_G$ with the total
savings of at least $(d'-1)mH + K$.

Proof. We consider the service of the original instance $\mathcal{I}_G$ described in the proof
of Lemma 3.1 and modify it so it becomes a service of the modified instance.

In the new service, vertex-pages are served the same way as in the original
service. The savings on vertex-pages are thus again K.

For each pair of consecutive blocks B and B', each page kept in the cache
between B and B' in the original service is kept in the cache in the new service for
the whole time between B and B' (it spans over seven blocks now). For a page of
size two, savings of two are incurred three times. For a page of size three, savings
of three are incurred twice. On each page in the new service we save six instead
of one. Therefore, the total savings on edge-pages are $6(d-1)mH = (d'-1)mH$.
The total savings are $(d'-1)mH + K$. □

Lemma 4.3. Suppose that there exists a service of the modified instance $\mathcal{I}'_G$
with the total savings of at least $(d'-1)mH + K$. Then the graph G has an
independent set W of cardinality K.

Proof. This lemma is the same as Lemma 3.7. We just need to verify that the 
modified instance fulfills Properties 3.2. 

To prove that the property (b) is preserved, we observe that each two consecutive
requests on a page of size two are separated by exactly one block where the
page is not requested. Consequently, when there are savings of two on a request
on the page of size two, we assign savings of one to the block where the savings
were incurred and savings of one to the previous block. Similarly, each pair of
consecutive requests on a page of size three is separated by exactly two blocks
where the page is not requested. When there are savings of three on a request
on the page of size three, we assign savings of one to the block where the savings
were incurred and savings of one to each of the two previous blocks. As a consequence,
the total savings gained on edge-pages may indeed be computed as
$\sum_B s_B$.

The property (a) is preserved because the requests on vertex-pages are the 
same in both instances. The property (c) is preserved because the initial and 
final blocks are the same in both instances. 

Each sequence of requests in a block of the modified instance $\mathcal{I}'_G$
is either the same as, or a subsequence of, the sequence in a block of the
original instance. Therefore, the properties (d), (e) and (f) are preserved as
well. □

Lemmata 4.2 and 4.3 imply that we have a valid polynomial-time reduction 
and so the problem 3Caching (bit,optional) is strongly NP-hard. 

5 Forced Policy 

Theorem 5.1. Both the problem 3Caching(fault, forced) and the problem
3Caching(bit, forced) are strongly NP-hard.

Proof. For both the fault model and the bit model, we show a polynomial-time
reduction from caching with the optional policy to the corresponding variant of
caching with the forced policy. Take an instance of caching with the optional
policy with the cache size C and the request sequence $\rho = r_1 \dots r_n$;
let M be the maximal size of a page in $\rho$ (in our previous reductions,
M = 3).

We create an instance of caching with the forced policy. The cache size is
C' = C + M. The request sequence is $\rho' = r_1 q_1 r_2 q_2 \dots r_n q_n$,
where $q_1, \dots, q_n$ are requests to n different pages that do not appear in
$\rho$ and have size M. The costs of the new pages are one in the fault model
and M in the bit model.
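
The construction of the forced instance is mechanical, and a minimal Python
sketch may make it concrete. The names `Page` and `to_forced_instance` and the
labels `q1, q2, ...` are hypothetical and chosen for this illustration only;
the sketch assumes that no original page already uses one of these labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    name: str
    size: int
    cost: int


def to_forced_instance(cache_size, requests, model):
    """Build (C', rho') for the forced policy from an optional-policy instance.

    requests is the sequence rho = r_1 ... r_n as a non-empty list of Page
    objects; model is "fault" (new pages cost 1) or "bit" (new pages cost M).
    """
    if model not in ("fault", "bit"):
        raise ValueError("model must be 'fault' or 'bit'")
    max_size = max(page.size for page in requests)      # M
    new_cost = 1 if model == "fault" else max_size
    forced_requests = []
    for i, page in enumerate(requests, start=1):
        forced_requests.append(page)                     # r_i
        # q_i: a fresh page of size M that is requested exactly once
        forced_requests.append(Page(f"q{i}", max_size, new_cost))
    return cache_size + max_size, forced_requests        # C' = C + M
```

For instance, calling `to_forced_instance(4, rho, "fault")` on a sequence `rho`
of three requests whose largest page has size three yields the cache size 7 and
a sequence of six requests in which every second request is to a new page.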

We claim that there is a service of the optional instance with savings S if
and only if there is a service of the forced instance with savings S.

⇒ We serve the requests on original pages the same way as in the optional
instance. The cache is larger by M, which is the size of the largest page. Thus,
pages that were not loaded into the cache because of the optional policy fit in
there; we can load them and immediately evict them. New pages fit into the
cache as well and we also load them and immediately evict them. This way we
have the same savings as in the optional instance.

⇐ We construct a service for the optional instance: For each i, when serving
$r_i$ we consider the evictions done when serving $r_i$ and $q_i$ of the forced
instance. If a page requested before $r_i$ is evicted, we evict it as well. If
the page requested by $r_i$ is evicted, we do not cache it at all. Because the
page requested by $q_i$ has size M, the original pages occupy at most C slots in
the cache when $q_i$ is served. This way we obtain a service of the optional
instance with the same savings.

Using the strong NP-hardness of the problems 3Caching(fault, optional) and
3Caching(bit, optional) proven in Sections 2 and 3 and the observation that the
reduction preserves the maximal size of a page, we obtain the strong
NP-hardness of 3Caching(fault, forced) and 3Caching(bit, forced). □

A The Simple Proof

In this appendix, we present a simple variant of the proof for the almost-fault 
model with two distinct costs. This completes the sketch of the proof presented at 
the end of Section 2. We present it with a complete description of the simplified 
reduction, so that it can be read independently of the rest of the paper. This 
appendix can therefore serve as a short proof of the hardness of general caching. 

Theorem A.1. General caching is strongly NP-hard, even in the case when
page sizes are limited to {1, 2, 3} and there are only two distinct fault costs.

We prove the theorem for the optional policy. It is easy to obtain the theorem 
also for the forced policy the same way as in the proof of Theorem 5.1. 


The Reduction 


The reduction described here will be equivalent to Reduction 2.1 with H = 1
and the fault cost of each vertex-page set to 1/(n + 1).

Suppose we have a graph G = (V, E) with n nodes and m edges. We construct an
instance of general caching whose optimal solution encodes a maximum
independent set in G. Fix an arbitrary numbering of edges $e_1, \dots, e_m$.

The cache size is C = 2m + 1. For each vertex v, we have a vertex-page $p_v$
with size one and cost 1/(n + 1). For each edge e, we have six associated
edge-pages $a^e, \bar{a}^e, \alpha^e, b^e, \bar{b}^e, \beta^e$; all have cost
one, pages $\alpha^e, \beta^e$ have size three and the remaining pages have
size two.

The request sequence is organized in phases and blocks. There is one phase
for each vertex. In the phase of a vertex v, there are two adjacent blocks
associated with every edge e incident with v; the incident edges are processed
in an arbitrary order. In addition, there is one initial block I before all
phases and one final block F after all phases. Altogether, there are
d = 4m + 2 blocks. There are four blocks associated with each edge e; denote
them $B_1^e, B_2^e, B_3^e, B_4^e$, in the order in which they appear in the
request sequence.

For each $v \in V$, the associated page $p_v$ is requested exactly twice, right
before the beginning of the v-phase and right after the end of the v-phase;
these requests do not belong to any phase. An example of the structure of
phases and blocks is given in Fig. 6.


Fig. 6. An example of phases, blocks and requests on vertex-pages for a graph
with three vertices u, v, w and two edges $e_1$, $e_2$ when H = 2


Even though each block is associated with some fixed edge, it contains one or
more requests to the associated pages for every edge e. In each block, we
process the edges $e_1, \dots, e_m$ in this order. For each edge e, we make one
or more requests to the associated pages as follows. If the current block is:

• before $B_1^e$: request $\bar{a}^e$;

• $B_1^e$: request $\bar{a}^e$, $\alpha^e$, and $b^e$;

• $B_2^e$: request $\alpha^e$, $a^e$, and $b^e$;

• after $B_2^e$ and before $B_3^e$: request $a^e$ and $b^e$;

• $B_3^e$: request $a^e$, $b^e$, and $\beta^e$;

• $B_4^e$: request $a^e$, $\beta^e$, and $\bar{b}^e$;

• after $B_4^e$: request $\bar{b}^e$.
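
The request pattern for a single edge can be written down as a small table and
checked mechanically. The following Python sketch is an illustration only: the
ASCII labels (`a_bar`, `alpha`, ...) and the column names are ad hoc, and the
assignment of bars and Greek letters to the individual requests follows our
reading of the list above, so it should be treated as an assumption. The sketch
verifies that every edge-page is requested in a contiguous segment of blocks.

```python
# Block positions relative to B_1^e, ..., B_4^e, from left to right.
COLUMNS = ["before_B1", "B1", "B2", "between_B2_B3", "B3", "B4", "after_B4"]

# Requests for one edge e, in the order in which they are issued in the block.
REQUESTS = {
    "before_B1":     ["a_bar"],
    "B1":            ["a_bar", "alpha", "b"],
    "B2":            ["alpha", "a", "b"],
    "between_B2_B3": ["a", "b"],
    "B3":            ["a", "b", "beta"],
    "B4":            ["a", "beta", "b_bar"],
    "after_B4":      ["b_bar"],
}

SIZE = {"a": 2, "a_bar": 2, "b": 2, "b_bar": 2, "alpha": 3, "beta": 3}


def columns_of(page):
    """Indices of the columns in which `page` is requested."""
    return [i for i, col in enumerate(COLUMNS) if page in REQUESTS[col]]


def is_contiguous(indices):
    """True if the indices form a run of consecutive integers."""
    return indices == list(range(indices[0], indices[-1] + 1))


for page in SIZE:
    cols = columns_of(page)
    assert is_contiguous(cols), page
    print(page, "size", SIZE[page], [COLUMNS[i] for i in cols])
```

Under this reading, each page of size three is requested in exactly two
adjacent blocks, and in every block the a-type page precedes the b-type page.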

Fig. 7 shows an example of the requests on edge-pages associated with one 
particular edge. 

Fig. 7. Requests on all pages associated with the edge e. Each column represents
some block(s). The four labelled columns represent the blocks in the heading,
the first column represents every block before $B_1^e$, the middle column
represents every block between $B_2^e$ and $B_3^e$, and the last column
represents every block after $B_4^e$. The requests in one column are ordered
from top to bottom.


Proof of Correctness 

Instead of minimizing the service cost, we maximize the savings compared to 
the service which does not use the cache at all. This is clearly equivalent when 
considering the decision version of the problem. 

Without loss of generality, we assume that any page is brought into the cache 
only immediately before a request to that page and removed from the cache only 
immediately after a request to that page; furthermore, at the beginning and at 
the end the cache is empty. I.e., a page may be in the cache only between two 
consecutive requests to this page, and either it is in the cache for the whole 
interval or not at all. 

Each page of size three is requested only twice in two consecutive blocks, and
these blocks are distinct for all pages of size three. Hence at most one page of
size three can be in the cache at any time, and since C = 2m + 1, a service of
edge-pages is valid if and only if at each time at most m edge-pages are in the
cache. It is thus convenient to think of the cache as m slots for edge-pages.

Each vertex-page is requested twice. Thus, the savings on the n vertex-pages
are at most n/(n + 1) < 1. Since all edge-pages have cost one, the optimal
service must serve them optimally. Furthermore, a vertex-page can be cached if
and only if during the phase it never happens that at the same time all slots
for edge-pages are full and a page of size three is cached.

Let $S_B$ denote the set of all edge-pages cached at the beginning of the
block B and let $s_B = |S_B|$. Now observe that each edge-page is requested only
in a contiguous segment of blocks, once in each block. It follows that the total
savings on edge-pages are equal to $\sum_B s_B$, where the sum is over all
blocks. In particular, the maximal possible savings on the edge-pages are
$(d - 1)m$, using the fact that $S_I$ is empty.

We prove that there is a service with the total savings of at least
$(d - 1)m + K/(n + 1)$ if and only if there is an independent set of size K
in G. We start with the easy direction.

Lemma A.2. Suppose that G has an independent set W of size K. Then there
exists a service with the total savings of $(d - 1)m + K/(n + 1)$.

Proof. For any e, denote e = uv so that u precedes v in the ordering of phases.
If $u \in W$, we keep $\bar{a}^e$, $b^e$, $\bar{b}^e$ and $\beta^e$ in the cache
from the first to the last request on each page, and we do not cache $a^e$ and
$\alpha^e$ at any time. Otherwise we cache $\bar{b}^e$, $\bar{a}^e$, $a^e$ and
$\alpha^e$, and do not cache $b^e$ and $\beta^e$ at any time. In both cases, at
each time at most one page associated with e is in the cache, and the total
savings on edge-pages are $(d - 1)m$. See Fig. 8 for an illustration.


Fig. 8. The two ways of caching in Lemma A.2


For any $v \in W$, we cache $p_v$ between its two requests. To check that this
is a valid service, observe that if $v \in W$, then during the corresponding
phase no page of size three is cached. Thus the page $p_v$ always fits in the
cache together with at most m pages of size two. □
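
The service from the proof can be checked on small graphs by direct simulation.
The Python sketch below is only an illustration under stated assumptions: the
ASCII page labels, the numbering of blocks (0 for I, d - 1 for F) and the
function names are ours, and the request table encodes our reading of the
reduction. It builds the request sequence, keeps the pages prescribed by the
proof from their first to their last request, asserts that the cache of size
2m + 1 never overflows, and returns the savings as an exact fraction.

```python
from fractions import Fraction

# Requests for one edge by position (0 = before B1, ..., 6 = after B4).
REQUESTS = [["a_bar"], ["a_bar", "alpha", "b"], ["alpha", "a", "b"], ["a", "b"],
            ["a", "b", "beta"], ["a", "beta", "b_bar"], ["b_bar"]]
SIZE = {"a": 2, "a_bar": 2, "b": 2, "b_bar": 2, "alpha": 3, "beta": 3}


def build_sequence(vertices, edges):
    """Return (seq, d): the flat list of requested pages and the block count."""
    idx = [[] for _ in edges]        # idx[e] = block indices of B_1^e .. B_4^e
    phases, t = [], 1                # block 0 is the initial block I
    for v in vertices:
        blocks = []
        for e, (x, y) in enumerate(edges):
            if v in (x, y):
                blocks += [t, t + 1]  # two adjacent blocks for e in the v-phase
                idx[e] += [t, t + 1]
                t += 2
        phases.append((v, blocks))
    d = t + 1                        # blocks 0 .. t, where block t is F

    seq = []

    def emit_block(t):
        for e, (b1, b2, b3, b4) in enumerate(idx):
            col = (0 if t < b1 else 1 if t == b1 else 2 if t == b2 else
                   3 if t < b3 else 4 if t == b3 else 5 if t == b4 else 6)
            seq.extend((e, label) for label in REQUESTS[col])

    emit_block(0)
    for v, blocks in phases:
        seq.append(("p", v))         # right before the v-phase
        for t in blocks:
            emit_block(t)
        seq.append(("p", v))         # right after the v-phase
    emit_block(d - 1)
    return seq, d


def check_service(vertices, edges, W):
    """Simulate the service of Lemma A.2 for the set W; return (savings, d)."""
    n, m = len(vertices), len(edges)
    seq, d = build_sequence(vertices, edges)
    order = {v: i for i, v in enumerate(vertices)}
    kept = {("p", v) for v in W}
    for e, (x, y) in enumerate(edges):
        u = min(x, y, key=order.get)  # the endpoint whose phase comes first
        labels = (["a_bar", "b", "b_bar", "beta"] if u in W
                  else ["b_bar", "a_bar", "a", "alpha"])
        kept |= {(e, label) for label in labels}

    times = {}                       # page -> list of its request times
    for i, page in enumerate(seq):
        times.setdefault(page, []).append(i)

    def size(page):
        return 1 if page[0] == "p" else SIZE[page[1]]

    def cost(page):
        return Fraction(1, n + 1) if page[0] == "p" else Fraction(1)

    # A kept page occupies the cache from its first to its last request.
    for i in range(len(seq)):
        used = sum(size(p) for p in kept if times[p][0] <= i <= times[p][-1])
        assert used <= 2 * m + 1, "cache overflow"
    savings = sum((len(times[p]) - 1) * cost(p) for p in kept)
    return savings, d
```

For the path with vertices u, v, w, edges {u, v} and {v, w}, and W = {u, w},
the call `check_service(["u", "v", "w"], [("u", "v"), ("v", "w")], {"u", "w"})`
gives d = 10 and savings 37/2, which agrees with $(d - 1)m + K/(n + 1)$ for
m = 2, K = 2 and n = 3. With the non-independent set {u, v}, the capacity
assertion fails.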

Now we prove the converse in a sequence of claims. Fix a valid service with
savings at least $(d - 1)m$. For a block B, let B' denote the following block.

Claim A.3. For any block B, with the exception of B = I, we have $s_B = m$.

Proof. For each $B \ne I$ we have $s_B \le m$. Because $s_I = 0$, the total
savings on edge-pages are $\sum_B s_B \le (d - 1)m$. The savings on vertex-pages
are less than one and the savings on edge-pages are an integer, so the savings
on edge-pages are at least $(d - 1)m$. Hence equality holds, which requires
$s_B = m$ for every $B \ne I$. □

We now prove that each edge occupies exactly one slot during the service.

Claim A.4. For any block $B \ne I$ and for any e, $S_B$ contains exactly one
page associated with e.

Proof. Let us use the notation
$S_B^{\le k} = S_B^{e_1} \cup \dots \cup S_B^{e_k}$ and
$s_B^{\le k} = |S_B^{\le k}|$, where $S_B^{e}$ is the set of pages in $S_B$
associated with the edge e. First, we shall prove for each $k \le m$

    $s_B^{\le k} = k$.    (8)

This is true for B = F, as only the m edge-pages $\bar{b}^e$ can be cached
there, and by the previous claim all of them are indeed cached. Similarly for
B = I' (i.e., immediately following the initial block).

If (8) is not true, then for some k and $B \notin \{I, F\}$ we have
$s_B^{\le k} < s_{B'}^{\le k}$. Then after processing the edge $e_k$ in the
block B we have in the cache all the pages in
$(S_B \setminus S_B^{\le k}) \cup S_{B'}^{\le k}$. Their number is
$(m - s_B^{\le k}) + s_{B'}^{\le k} > m$, a contradiction.

The statement of the claim is an immediate consequence of (8). □ 


Claim A.5. For any edge e, at least one of the pages $\alpha^e$ and $\beta^e$
is cached between its two requests.


Proof. Assume that neither of the two pages is cached. It follows from the
previous claim that $b^e \in S_{B_2^e}$, as at this point $\alpha^e$ and $b^e$
are the only pages associated with e that can be cached. Similarly,
$a^e \in S_{B_4^e}$.

It follows that there exists a block B between $B_1^e$ and $B_4^e$ such that
$S_B$ contains the page $b^e$ and $S_{B'}$ contains the page $a^e$. However, in
B, the page $a^e$ is requested before the page $b^e$. Thus at the point between
the two requests, the cache contains two pages associated with e, plus one page
associated with every other edge, the total of m + 1 pages, a contradiction. □


Now we are ready to complete this direction. 


Lemma A.6. Suppose that there exists a valid service with the total savings of
$(d - 1)m + K/(n + 1)$. Then G has an independent set W of size K.

Proof. Let W be the set of all v such that $p_v$ is cached between its two
requests. The total savings imply that |W| = K.

Now we claim that W is independent. Suppose not, and let e = uv be an edge
with $u, v \in W$. Then $p_u$ and $p_v$ are cached in the corresponding phases.
Thus neither $\alpha^e$ nor $\beta^e$ can be cached, since together with the
other m - 1 pages of size 2 associated with the remaining edges, the cache size
needed would be 2m + 2. However, this contradicts the last claim. □


Lemmata A.2 and A.6 together show that we constructed a valid polynomial-time
reduction from the problem of independent set to general caching. Therefore,
Theorem A.1 is proven.



